{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LAB 03.01 - Model Generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget --no-cache -O init.py -q https://raw.githubusercontent.com/fagonzalezo/ai4eng-unal/main/content/init.py\n",
    "import init; init.init(force_download=False); init.get_weblink()\n",
    "init.endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from local.lib.rlxmoocapi import submit, session\n",
    "session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id=\"L03.01\", varname=\"student\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import make_moons\n",
    "from local.lib import mlutils\n",
    "from IPython.display import Image\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A machine learning task\n",
    "\n",
    "We have two species of bugs (**X bugs** and **Z bugs**), for each bug we have measured its **width** and **length**. Once we have a bug, determining if is of  **species X** or **species Z** is very costly (lab analysis, etc.)\n",
    "\n",
    "**Machine learning goal**: We want to create a model so that, when given the width and length of a bug, will tell us whether it belongs to  **species X** or **species Z**. If the model performs well, we might use it insted of the lab analysis.\n",
    "\n",
    "**To train a machine learning model** we built a **training dataset** where we have **annotated** 20 bugs with their **confirmed** species. The training dataset has:\n",
    "\n",
    "- 20 data items\n",
    "- two data columns (**width** and **length**)\n",
    "- one label column, with two unique values: **0 for species X**, and **1 for species Z**.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(20, 3) (20, 2) (20,)\n",
      "[[0.5  0.65]\n",
      " [0.75 0.34]\n",
      " [0.37 0.5 ]\n",
      " [0.57 0.74]\n",
      " [1.   0.69]]\n",
      "[0. 1. 1. 0. 1.]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>width</th>\n",
       "      <th>height</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.50</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.75</td>\n",
       "      <td>0.34</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.37</td>\n",
       "      <td>0.50</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.57</td>\n",
       "      <td>0.74</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.00</td>\n",
       "      <td>0.69</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   width  height    y\n",
       "0   0.50    0.65  0.0\n",
       "1   0.75    0.34  1.0\n",
       "2   0.37    0.50  1.0\n",
       "3   0.57    0.74  0.0\n",
       "4   1.00    0.69  1.0"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "d = pd.read_csv(\"local/data/trilotropicos_small.csv\")\n",
    "X,y = d.values[:,:2], d.values[:,-1]\n",
    "print (d.shape, X.shape, y.shape)\n",
    "print (X[:5])\n",
    "print (y[:5])\n",
    "d.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since it is just two columns, we can visualize it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAcVklEQVR4nO3df5DV9X3v8ed7+SHFRVRo99Is+6O3JNVYjSxX0wFvIMYGmbkwpOiQrFa96iY4oGPUe3FoNNI6d5qmwYkxNYtxomHrSrRa2mDRmt0YKmT4oaUBg0OQXXdMy88qK1mE7vv+8T0Lh+Xs7nf3nM/59X09Zs6c8/1+P3z3/eHsnvf5/Ph+vubuiIhIclUUOgARESksJQIRkYRTIhARSTglAhGRhFMiEBFJuNGFDmC4Jk+e7HV1dVmf58MPP+Tcc8/NPqASofqWN9W3vOWivtu2bTvo7r+d6VjJJYK6ujq2bt2a9Xna29uZPXt29gGVCNW3vKm+5S0X9TWzjoGOqWtIRCThlAhERBJOiUBEJOFKboxARGQwJ06coKuri56enkKHkjMTJ07krbfeilV23LhxVFdXM2bMmNjnVyIQkbLS1dXFhAkTqKurw8wKHU5OHD16lAkTJgxZzt05dOgQXV1d1NfXxz6/uoZEpKz09PQwadKkskkCw2FmTJo0aditISUCESk7SUwCfUZSdyUCEZGEC5YIzOxJM9tvZr8Y4LiZ2bfNbI+Z7TCz6aFikeLX0gJ1dVBRET23tBQ6IpGReffdd6mvr+fw4cMAHDlyhPr6ejo6zryea9++fVxyySWFCPEsIVsEPwDmDnL8WmBa6tEE/E3AWKSItbRAUxN0dIB79NzUpGQgpWnq1KksWbKE5cuXA7B8+XKampqora0tcGQDC5YI3P014PAgRRYAT3tkM3C+mU0JFY8UrxUr4NixM/cdOxbtFwktRGv07rvvZvPmzTzyyCNs3LiRe+65J2O5kydPctNNN3HppZeyaNEijqX+EOrq6jh48CAAW7duZd68eQAcOHCAa665hunTp/PlL3+Z2traU+WyUcjpox8D3k3b7krt+3X/gmbWRNRqoKqqivb29qx/eHd3d07OUyqKub7Llg18bKQhF3N9Q1B9T5s4cSJHjx6NdZ61a0ezbNk4fvObaIC1owNuv93p6enh+utPZhXjQw89xBe+8AVefPFFjh8/zvHjx8+qw+7du3n00Uf5zne+wx133MGqVau48847cXe6u7s555xz+PDDD3F3jh49yooVK5g5cyb33HMPr7zyCs3NzafKpevp6Rne74O7B3sAdcAvBjj2Y2BW2varQMNQ52xoaPBcaGtry8l5SkUx17e21j3qFDrzUVs78nMWc31DUH1P27VrV+zzhPjd63PXXXf5lClT/Fvf+lbG4++8845PnTr11Parr77qCxYsSMVV6wcOHHB39y1btvisWbPc3f2yyy7zvXv3nvo3F1xwwaly6TL9HwBbfYDP1ULOGuoCpqZtVwPvFSgWKaCHH4bx48/cN358tF8kpM7O4e2P68033+SVV15h8+bNrFq1il//+qyODuDsqZ5926NHj6a3txfgjGsCos/z3CtkIlgH/Glq9tCngffdPfP/lpS1xkZobobaWjCLnpubo/2FollMyVBTM7z9cbg7S5Ys4ZFHHqGmpob77ruPe++9N2PZzs5ONm3aBMAzzzzDrFmzgGiMYNu2bQA8//zzp8rPmjWLtWvXAvDyyy9z5MiRkQeaJuT00WeATcAnzKzLzG41s6+Y2VdSRdYDe4E9wGrgjlCxSPFrbIR9+6C3N3oudBLQLKZkCNEaXb16NTU1NVxzzTUA3HHHHfzyl7/kpz/96VllL7roIp566ikuvfRSDh8+zJIlSwB48MEHueuuu7jqqqsYNWrUqfIPPvggL7/8MtOnT+ell15iypQpsZaeGNJAfUbF+tAYwWlr
1kR9mWbR85o1A5cth/oORzb1DdlvHIre39OGM0bgPry/o0L54IMP3N29p6fHT5w44e7ur7/+ul922WUZyw93jECLzpWovm+tfdMu+761QmG/TZeDUP3GUpwaG0vnb6azs5Prr7+e3t5exo4dy+rVq3NyXiWCEjXY3PtS+aUuVjU1UWLNtF+kkKZNm8Ybb7yR8/NqraESpW+t4WgWkySNEkGJCjHbQSLFOItJJCQlghKlb61hFdMsJpHQlAhKlL61ikiuKBGUMH1rFSk+L7zwAp/61KfOeFRUVPDSSy+dUa6YlqHWrCERkRxauHAhCxcuPLXd3NxMS0sLn//85wsY1eDUIhCRZAu4nsjbb7/NypUr+eEPf0hFxdkft8WyDLUSgYgkV8D1RE6cOMGXvvQlvvnNb1IzwHS+3bt309TUxI4dOzjvvPP47ne/O+g5H3roIT772c+yfft2Fi5cSGeO5osrEYhIcgW8K9LXvvY1PvnJT7J48eIBy0ydOpWZM2cCcMMNN7Bx48ZBz7lx48ZT55s7dy4XXHBB1nGCxghEJMkCXZnZ3t7O888/z/bt2wctp2WoRUQKLcCVmUeOHOGWW27h6aefHnJl0LJfhlpEpOgFuDLz8ccfZ//+/SxZsuSMKaTPPvvsWWWLZRlqdQ2JSHL1XXyzYkXUHVRTEyWBLC7Kuf/++7n//vuHLFdXV8euXbsyHrvqqqt4++23T2333YN54sSJbNiwgdGjR7Np0yba2trOul/xSCgRiEiyldA61FqGWkQk4bQMtYhITKFm15SCkdRdiUBEysq4ceM4dOhQIpOBu3Po0CHGjRs3rH+nriERKSvV1dV0dXVx4MCBQoeSMz09PbE/3MeNG0d1dfWwzq9EICJlZcyYMdTX1xc6jJxqb2/n8ssvD3Z+dQ2JiCScEoGISMIpEYiIJJwSgcQScMl2ESkwDRbLkPqWbO9brbdvyXYomQsyRWQQahHIkAIu2S4iRUCJQIYUaMl2ESkSSgQypABLtotIEQmaCMxsrpntNrM9ZrY8w/EaM2szszfMbIeZzQsZj4xMgCXbRaSIBEsEZjYKeAy4FrgY+KKZXdyv2J8Ba939cmAxMPidm6UgGhuhuRlqa8Esem5u1kCxSLkIOWvoCmCPu+8FMLNWYAGQficGB85LvZ4IvBcwHslCCS3ZLiLDFDIRfAx4N227C7iyX5mvAy+b2TLgXOBzAeMREZEMLNRSrWZ2HfB5d78ttX0jcIW7L0sr89VUDH9tZn8EfB+4xN17+52rCWgCqKqqamhtbc06vu7ubiorK7M+T6lQfcub6lveclHfOXPmbHP3GZmOhWwRdAFT07arObvr51ZgLoC7bzKzccBkYH96IXdvBpoBZsyY4bNnz846uPb2dnJxnlKh+pY31be8ha5vyFlDW4BpZlZvZmOJBoPX9SvTCVwNYGYXAeOA8llEPMm0JoVIyQjWInD3k2a2FNgAjAKedPedZrYS2Oru64B7gNVmdjfRwPHNnsTbCpUbrUkhUlKCrjXk7uuB9f32PZD2ehcwM2QMUgCDrUmhRCBSdHRlseSe1qQQKSlKBJJ7WpNCpKQoEUjuaU0KkZKiRCC5pzUpckITryRfdGMaCUNrUmRFE68kn9QiEClCuhmQ5JMSgUgR0sQryafEJYK+ftdt29TvKsVLE68knxKVCPr6XTs6ou2+flclAyk2mngl+ZSoRKB+VykVmngl+ZSoWUPqd5VSoolXki+JahGo31VE5GyJSgTqdxUROVuiEkF6vyuo31VEilyepjkmaowATve7trfDvn2FjkZEZAB5vLw8US0CEZGSkcdpjkoEIiLFKI/THJUIRESKUR6nOSoRiIgUozxOc1QiEBEpRnmc5qhEIJIDuomMBNHYGE1vbGiIngPNdU/c9FGRXNNNZKTUqUUgkiUtZiilTolAJEtazFBKnRKBSJa0mKGUOiUCkSxpMUMpdUoEIlnSTWSk1GnWkEgO6CYyUsrUIsgT
zTMXkWKlFkEeaJ65iBSzoC0CM5trZrvNbI+ZLR+gzPVmtsvMdprZ34aMp1A0z1xEilmwFoGZjQIeA64BuoAtZrbO3XellZkG3A/MdPcjZvY7oeIpJM0zF5FiFrJFcAWwx933uvtHQCuwoF+Z24HH3P0IgLvvDxhPwWieuYgUM3P3MCc2WwTMdffbUts3Ale6+9K0Mi8CbwMzgVHA1939nzKcqwloAqiqqmpobW3NOr7u7m4qKyuzPk8chw9H4wK9vaf3VVRE0wwvvDAvIeS1vsVA9S1vqu/wzZkzZ5u7z8h40N2DPIDrgCfStm8EHu1X5h+BF4AxQD1RF9L5g523oaHBc6GtrS0n54lrzRr32lp3s+h5zZq8/vi817fQVN/ypvoOH7DVB/hcDTlrqAuYmrZdDbyXocxmdz8BvGNmu4FpwJaAcRWE5pmLSLEKOUawBZhmZvVmNhZYDKzrV+ZFYA6AmU0GPg7sDRiTiIj0EywRuPtJYCmwAXgLWOvuO81spZnNTxXbABwys11AG3Cfux8KFZOIiJwt6AVl7r4eWN9v3wNprx34auohIiIFEDsRpK4LqEr/N+6umfAiIiUuViIws2XAg8B/AH2TIB24NFBcIiKSJ3FbBHcBn1D/vYhI+Yk7WPwu8H7IQEREpDAGbRGYWd8g7l6g3cx+DBzvO+7u3woYm4iI5MFQXUMTUs+dqcfY1AOiMQIRESlxgyYCd38IwMyuc/cfpR8zs+tCBiYiIvkRd4zg/pj7RESkxAw1RnAtMA/4mJl9O+3QecDJkIGJiEh+DDVG8B6wFZgPbEvbfxS4O1RQIiKSP4N2Dbn7v7r7U8Dvu/tTaY+/89TNZETypqUF6uqimznU1UXbIpK1uBeUbTez/rOE3idqLfyFLjST4FpaoKnp9M2fOzqibdD63iJZijtY/BLwY6Ax9fgH4GfAvwM/CBKZSLoVK04ngT7HjkX7RSQrcVsEM919Ztr2v5nZv7j7TDO7IURgImfoHGB9w4H2i0hscVsElWZ2Zd+GmV0B9N1AU7OHJLyamuHtF5HY4iaC24AnzOwdM9sHPAHcbmbnAv8vVHAipzz8MIwff+a+8eOj/SKSlVhdQ+6+BfhDM5sImLv/Z9rhtUEiE0nXNyC8YkXUHVRTEyUBDRSLZC3u/QjOAf4EqANGmxkA7r4yWGQi/TU26oNfJIC4g8V/TzRddBtpq4+KiEjpi5sIqt19btBIRESkIOIOFr9uZn8YNBIR0cXTUhBxWwSzgJvN7B2iriED3N11z2KRHNHF01IocRPBtUGjEJFBL55WIpCQYnUNuXsHMBX4bOr1sbj/VkTi0cXTUiixPszN7EHg/3L6ZjRjgDWhghJJIl08LYUS91v9QqJ7EnwI4O7vcfp+xiKSA7p4WgolbiL4yN2d1A3rU0tLiEgONTZCczPU1oJZ9NzcrPEBCS/uYPFaM/secL6Z3Q78b2B1uLBEkkkXT0shxB0s/ibwHPA88AngAXd/NGRgIiI5pYs0BhS3RYC7vwK8EjAWEZEwdJHGoAZtEZjZUTP7IMPjqJl9MNTJzWyume02sz1mtnyQcovMzM1sxkgqISIyKN3hblCDtgjcfcQzg8xsFPAYcA3QBWwxs3XuvqtfuQnAncDPR/qzREQGpYs0BhXyorArgD3uvtfdPwJagQUZyv058A2gJ2AsIpJkukhjUBbNCg1wYrNFwFx3vy21fSNwpbsvTStzOfBn7v4nZtYO3OvuWzOcqwloAqiqqmpobW3NOr7u7m4qKyuHLlgmVN/ypvoO4fDhaFygt/f0voqKaI7uhRfmPsAcy8X7O2fOnG3unrn73d2DPIDrgCfStm8EHk3brgDagbrUdjswY6jzNjQ0eC60tbXl5DylQvUtb6pvDGvWuNfWuptFz2vW5DiqcHLx/gJbfYDP1dizhkagi2h9oj7VwHtp2xOAS4D21B3P/huwzszme4ZWgYhIVnSRxoBCjhFsAaaZWb2ZjQUWA+v6
Drr7++4+2d3r3L0O2AwoCYiI5FmwRODuJ4GlwAbgLWCtu+80s5VmNj/Uz801XYMiIuUuZNcQ7r4eWN9v3wMDlJ0dMpaR0DUoIpIEuqfAIHQNiogkgRLBIHQNiogkgRLBIHQNikgAGngrOkoEg9CNQkRyrG/graMD3E8PvCkZFJQSwSB0oxCRHNPAW1EKOmuoHOgaFJEc0sBbUVKLQETyRwNvRUmJQETyRwNvRUmJQETyRwNvRUljBCKSXxp4KzpqEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJFzQRGBmc81st5ntMbPlGY5/1cx2mdkOM3vVzGpDxiMiImcLlgjMbBTwGHAtcDHwRTO7uF+xN4AZ7n4p8BzwjVDxiIhIZiFbBFcAe9x9r7t/BLQCC9ILuHubux9LbW4GqgPGI/nU0gJ1dVBRET23tBQ6IhEZgLl7mBObLQLmuvttqe0bgSvdfekA5b8D/Lu7/0WGY01AE0BVVVVDa2tr1vF1d3dTWVmZ9XlKRV7re/gwdHRAb+/pfRUVUFsLF16YlxD0/pY31Xf45syZs83dZ2Q6NjqrMw/OMuzLmHXM7AZgBvCZTMfdvRloBpgxY4bPnj076+Da29vJxXlKRV7rW1cXJYL+amth3768hKD3t7ypvrkVMhF0AVPTtquB9/oXMrPPASuAz7j78YDxSL50dg5vv4gUVMgxgi3ANDOrN7OxwGJgXXoBM7sc+B4w3933B4xF8qmmZnj7RaSggiUCdz8JLAU2AG8Ba919p5mtNLP5qWJ/BVQCPzKzN81s3QCnk1Ly8MMwfvyZ+8aPj/aLSNEJ2TWEu68H1vfb90Da68+F/PlSII2N0fOKFVF3UE1NlAT69otIUQmaCCTBGhv1wS9SIrTEhIhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEBLS3Sv6YqK6LmlpdARSR7pfgQiSdfSAk1NcOxYtN3REW2D7imREGoRiCTdihWnk0CfY8ei/ZIISgQiSdfZObz9UnaUCESSrqZmePul7CgRiCTdww/D+PFn7hs/PtoviaBEIJJ0jY3Q3Ay1tWAWPTc3a6A4QTRrSESiD3198CeWWgQiIgmnRCAiknBKBCIiCadEICKScEoEIiIJp0Qgkkn6ImyTJ0cPLcgmZUrTR0X6678I26FDp49pQTYpQ0FbBGY218x2m9keM1ue4fg5ZvZs6vjPzawuZDwisWRahC2dFmQ7k5awLnnBEoGZjQIeA64FLga+aGYX9yt2K3DE3X8fWAX8Zah4RGKLs9iaFmSL9LWeOjrA/XSLScmgpIRsEVwB7HH3ve7+EdAKLOhXZgHwVOr1c8DVZmYBYxIZWpzF1rQgW0RLWJcFc/cwJzZbBMx199tS2zcCV7r70rQyv0iV6Upt/ypV5mC/czUBTQBVVVUNra2tWcfX3d1NZWVl1ucpFarvMBw+HH2z7e3NfLyiIlqP58ILRx5gjhXs/d22beBjDQ3Bfqx+n4dvzpw529x9RsaD7h7kAVwHPJG2fSPwaL8yO4HqtO1fAZMGO29DQ4PnQltbW07OUypU32Fas8a9ttbdzH3SpOhhFu1bsyYHEeZWwd7f2lr3qFPozEdtbdAfq9/n4QO2+gCfqyG7hrqAqWnb1cB7A5Uxs9HAROBwwJhE4mlshH37olbBwYPRo7c32qfZQqdpCeuyEDIRbAGmmVm9mY0FFgPr+pVZB9yUer0I+Ekqc4lIKdAS1mUh2HUE7n7SzJYCG4BRwJPuvtPMVhI1UdYB3wd+aGZ7iFoCi0PFIyKBaAnrkhf0gjJ3Xw+s77fvgbTXPURjCSIiUiBa
YkJEJOGUCEREEk6JQEQk4ZQIREQSTolARCThlAhERBIu2FpDoZjZAaAjB6eaDBwcslT5UH3Lm+pb3nJR31p3/+1MB0ouEeSKmW31gRZgKkOqb3lTfctb6Pqqa0hEJOGUCEREEi7JiaC50AHkmepb3lTf8ha0vokdIxARkUiSWwQiIoISgYhI4pV9IjCzuWa228z2mNnyDMfPMbNnU8d/bmZ1+Y8yd2LU96tmtsvMdpjZq2ZWW4g4c2Wo+qaVW2RmbmYlPeUwTn3N7PrUe7zTzP423zHmUozf5xozazOzN1K/0/MKEWcumNmTZrY/dS/3TMfNzL6d+r/YYWbTc/bDB7qHZTk8iG6I8yvg94CxwL8CF/crcwfweOr1YuDZQscduL5zgPGp10vKvb6pchOA14DNwIxCxx34/Z0GvAFckNr+nULHHbi+zcCS1OuLgX2FjjuL+v5PYDrwiwGOzwNeAgz4NPDzXP3scm8RXAHscfe97v4R0Aos6FdmAfBU6vVzwNVmZnmMMZeGrK+7t7n7sdTmZqJ7SZeqOO8vwJ8D3wB68hlcAHHqezvwmLsfAXD3/XmOMZfi1NeB81KvJ3L2fdFLhru/xuD3bF8APO2RzcD5ZjYlFz+73BPBx4B307a7UvsylnH3k8D7wKS8RJd7ceqb7laibxilasj6mtnlwFR3/8d8BhZInPf348DHzexfzGyzmc3NW3S5F6e+XwduMLMuorshLstPaAUx3L/v2ILeqrIIZPpm33++bJwypSJ2XczsBmAG8JmgEYU1aH3NrAJYBdycr4ACi/P+jibqHppN1Nr7mZld4u7/GTi2EOLU94vAD9z9r83sj4jugX6Ju/eGDy/vgn1WlXuLoAuYmrZdzdlNx1NlzGw0UfNysOZZMYtTX8zsc8AKYL67H89TbCEMVd8JwCVAu5ntI+pXXVfCA8Zxf5//3t1PuPs7wG6ixFCK4tT3VmAtgLtvAsYRLdBWjmL9fY9EuSeCLcA0M6s3s7FEg8Hr+pVZB9yUer0I+ImnRmZK0JD1TXWVfI8oCZRy/zEMUV93f9/dJ7t7nbvXEY2JzHf3rYUJN2txfp9fJJoQgJlNJuoq2pvXKHMnTn07gasBzOwiokRwIK9R5s864E9Ts4c+Dbzv7r/OxYnLumvI3U+a2VJgA9EMhCfdfaeZrQS2uvs64PtEzck9RC2BxYWLODsx6/tXQCXwo9SYeKe7zy9Y0FmIWd+yEbO+G4A/NrNdwH8B97n7ocJFPXIx63sPsNrM7ibqJrm5VL/ImdkzRF16k1NjHg8CYwDc/XGiMZB5wB7gGHBLzn52if6fiYhIjpR715CIiAxBiUBEJOGUCEREEk6JQEQk4ZQIREQSTolAZATMbL2ZnZ9h/9fN7N7U65vN7HfTju1Lze0XKSpKBCIj4O7zYizbcDPwu0OUESk4JQKRDMzs/5jZnanXq8zsJ6nXV5vZmvRv92a2IrVm/j8Dn0jtW0S0llOLmb1pZr+VOvUyM9tuZv9mZn+Q/5qJnE2JQCSz14CrUq9nAJVmNgaYBfysr5CZNRBdjX458AXgfwC4+3PAVqDR3T/l7r9J/ZOD7j4d+Bvg3nxURGQoSgQimW0DGsxsAnAc2ESUEK4iLRGktl9w92Pu/gFnr4XT39+lnb8upxGLjFBZrzUkMlLufiK1YuktwOvADqLF3P478Fb/4sM4dd9qr/+F/v6kSKhFIDKw14i6b14jagV8BXiz36JmrwELzey3Uq2H/5V27CjRUtgiRU2JQGRgPwOmAJvc/T+IbnWZ3i2Eu28HngXeBJ7vd/wHwOP9BotFio5WHxURSTi1CEREEk6JQEQk4ZQIREQSTolARCThlAhERBJOiUBEJOGUCEREEu7/AzVWxzWTcdfKAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "\n",
    "plt.scatter(X[y==0][:,0], X[y==0][:,1], color=\"blue\", label=\"X bug\")\n",
    "plt.scatter(X[y==1][:,0], X[y==1][:,1], color=\"red\", label=\"Z bug\")\n",
    "plt.xlabel(\"width\");plt.ylabel(\"length\"); plt.legend(); plt.grid();\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 1. Manually use a predictive model\n",
    "\n",
    "We give you a procedure somewhat calibrated so that, given a new bug, it produces a prediction. The procedure depends on two parameters $\\theta_0$ and $\\theta_1$. Given the width $w^{(i)}$ and height $h^{(i)}$ of bug number $i$, the prediction $\\hat{y}^{(i)} \\in \\{0, 1\\}$ is computed as follows:\n",
    "\n",
    "$$\\hat{y}^{(i)} = 0\\text{ if }w^{(i)}<\\theta_0\\text{ AND }h^{(i)}>\\theta_1;\\;\\;\\;\\;\\;\\text{otherwise }\\hat{y}^{(i)}=1$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can be considered as a **model template**, depending on two parameters.\n",
    "\n",
    "\n",
    "Complete **the following function** so that whenever given a `numpy` array `X` $\\in \\mathbb{R}^m \\times \\mathbb{R}^2$ containing the width and height of $m$ bugs, returns a vector $\\in \\mathbb{R}^m$ with the predictions of the $m$ bugs as described in the expression above. The parameter `t` $\\in \\mathbb{R}^2$ contains, in this order, $\\theta_0$ and $\\theta_1$\n",
    "\n",
    "Observe that your function must return a `numpy` vector of **integers** (not booleans). \n",
    "\n",
    "**CHALLENGE**: solve it with one single line of code\n",
    "\n",
    "**HINT**: use `.astype(int)` to convert a `numpy` array of booleans to integers."
   ]
  },
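  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the hint (not the solution to the task), the snippet below applies a different, made-up rule to two hypothetical arrays `a` and `b`. Comparisons on `numpy` arrays yield boolean arrays, `&` combines them element-wise, and `.astype(int)` converts the result to 0/1 integers.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# hypothetical toy measurements, unrelated to the bug dataset\n",
    "a = np.array([0.2, 0.8, 0.4, 0.9])\n",
    "b = np.array([0.7, 0.1, 0.3, 0.6])\n",
    "\n",
    "# element-wise AND of two boolean arrays\n",
    "# (each comparison needs its own parentheses because & binds tighter than > and <)\n",
    "mask = (a > 0.5) & (b < 0.5)\n",
    "\n",
    "# booleans converted to integers: True becomes 1, False becomes 0\n",
    "flags = mask.astype(int)\n",
    "print(mask, flags)\n",
    "```\n",
    "\n",
    "The same pattern works on the columns of a 2D array, for instance `X[:,0]` and `X[:,1]`."
   ]
  },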
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict(X, t):\n",
    "    return ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "check manually your code, your predictions with the following `t` must be \n",
    "\n",
    "       [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]\n",
    "       \n",
    "with an accuracy of 0.75"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = np.r_[.5,.3]\n",
    "y_hat = predict(X, t)\n",
    "y_hat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.mean(y==y_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "observe the classification boundary that the model generates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,t), X, y); plt.grid();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and with other `t` ... which is better?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = np.r_[.5,.8]\n",
    "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,t), X, y); plt.grid();\n",
    "np.mean(y==predict(X,t))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "observe the prediction boundaries of other models. Change the `max_depth` of the decision tree to 2. Does it look familiar?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "mlutils.plot_2Ddata_with_boundary(LogisticRegression().fit(X,y).predict, X, y); plt.grid();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "mlutils.plot_2Ddata_with_boundary(DecisionTreeClassifier(max_depth=5).fit(X,y).predict, X, y); plt.grid();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.svm import SVC\n",
    "mlutils.plot_2Ddata_with_boundary(SVC(gamma=50).fit(X,y).predict, X, y); plt.grid();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**submit your answer**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "student.submit_task(globals(), task_id=\"task_01\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 2. Fit the model\n",
    "\n",
    "Given a set of annotated data $X$, $y$ and the **model template** of the previous exercise, complete the following function that returns $\\theta_0$ and $\\theta_1$ that produce the **best accuracy** on the given `X` and `y`. Consider only $\\theta_0$ and $\\theta_1$ with **one decimal number between 0 and 1**.\n",
    "\n",
    "**Hint**: use a brute force approach, consider all combinations of $\\theta_0$ and $\\theta_1 \\in$ [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]. Use [`np.linspace`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) and [`itertools.product`](https://docs.python.org/3/library/itertools.html#itertools.product)\n",
    "\n",
    "Your function must return an `numpy` array with two elements, the resulting $\\theta_0$ and $\\theta_1$"
   ]
  },
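  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The brute-force idea in the hint can be sketched as follows, under stated assumptions: `toy_score` is a hypothetical stand-in for the accuracy of a model, and `candidates`, `grid` and `best` are illustrative names. `np.linspace(0, 1, 11)` produces the eleven candidate values and `itertools.product` enumerates every pair. This is not the solution to the task, only the search pattern.\n",
    "\n",
    "```python\n",
    "import itertools\n",
    "import numpy as np\n",
    "\n",
    "# 11 evenly spaced candidates from 0.0 to 1.0, rounded to one decimal\n",
    "# to avoid floating point values such as 0.30000000000000004\n",
    "candidates = np.round(np.linspace(0, 1, 11), 1)\n",
    "\n",
    "# every (p, q) pair of candidates: 11 * 11 combinations\n",
    "grid = list(itertools.product(candidates, candidates))\n",
    "\n",
    "# hypothetical score to maximize, standing in for a model's accuracy\n",
    "def toy_score(p, q):\n",
    "    return -((p - 0.3) ** 2 + (q - 0.7) ** 2)\n",
    "\n",
    "# keep the pair with the highest score (the first one found in case of ties)\n",
    "best = max(grid, key=lambda pq: toy_score(*pq))\n",
    "print(len(grid), best)\n",
    "```\n",
    "\n",
    "An exhaustive search like this is only practical because the grid is tiny; the number of combinations grows quickly with more parameters or finer steps."
   ]
  },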
  {
   "cell_type": "code",
   "execution_count": 202,
   "metadata": {},
   "outputs": [],
   "source": [
    "import itertools\n",
    "\n",
    "\n",
    "def fit(X,y):\n",
    "    def predict(X, t):\n",
    "        return ...\n",
    "\n",
    "    return ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 203,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = fit(X,y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check your solution with the code below. The `t` returned by your function should produce an accuracy of 0.9 with the example data `X`, `y`. There might be several `t` producing the same accuracy, you just have to return any of those. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 204,
   "metadata": {},
   "outputs": [],
   "source": [
    "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,t), X, y); plt.grid();\n",
    "np.mean(y==predict(X,t))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "you can also use your model on different data. Execute the next cells several times to see the effect on different datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 229,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "\n",
    "bX, by = make_blobs(100,n_features=2, centers=2)\n",
    "bX = MinMaxScaler(feature_range=(0.1,.9)).fit_transform(bX)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 230,
   "metadata": {},
   "outputs": [],
   "source": [
    "bt = fit(bX, by)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 231,
   "metadata": {},
   "outputs": [],
   "source": [
    "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,bt), bX, by); plt.grid();\n",
    "np.mean(by==predict(bX,bt))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**submit your answer**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "metadata": {},
   "outputs": [],
   "source": [
    "student.submit_task(globals(), task_id=\"task_02\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 3: Make an `sklearn` compatible class with your model\n",
    "\n",
    "organize the previous methods in the following class structure. Bear in mind that:\n",
    "\n",
    "- the `fit` method now does not return `t`, which is now stored in an instance variable `self.t`\n",
    "- the `fit` method must now return `self`.\n",
    "- the `predict` method now does not accept `t` as argument, it must use the one stored in `self.t`"
   ]
  },
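  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The convention described above can be illustrated with a deliberately different toy estimator. `MajorityModel` and its attribute `majority_` are hypothetical names used only for this sketch: the class always predicts the most frequent training label, and it is not the model requested in this task.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "class MajorityModel:\n",
    "    # hypothetical toy estimator: always predicts the most frequent training label\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        # the learned state is stored in the instance, not returned\n",
    "        values, counts = np.unique(y, return_counts=True)\n",
    "        self.majority_ = values[np.argmax(counts)]\n",
    "        # returning self allows chaining fit and predict in one expression\n",
    "        return self\n",
    "\n",
    "    def predict(self, X):\n",
    "        # one prediction per row of X, using only the stored state\n",
    "        return np.full(len(X), self.majority_)\n",
    "\n",
    "X_toy = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])\n",
    "y_toy = np.array([1, 0, 1])\n",
    "\n",
    "model = MajorityModel()\n",
    "preds = model.fit(X_toy, y_toy).predict(X_toy)\n",
    "print(preds)\n",
    "```\n",
    "\n",
    "Because `fit` returns `self`, a call such as `MajorityModel().fit(X_toy, y_toy).predict` can be passed directly wherever a prediction function is expected."
   ]
  },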
  {
   "cell_type": "code",
   "execution_count": 294,
   "metadata": {},
   "outputs": [],
   "source": [
    "def SimpleModel():\n",
    "    class _SimpleModel:\n",
    "\n",
    "        def __init__(self):\n",
    "            pass\n",
    "\n",
    "        def fit(self, X, y):\n",
    "\n",
    "            ....\n",
    "            return self\n",
    "\n",
    "        def predict(self, X):\n",
    "            return ....\n",
    "        \n",
    "    return _SimpleModel()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 295,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = SimpleModel()\n",
    "m.fit(X,y)\n",
    "m.predict(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 296,
   "metadata": {},
   "outputs": [],
   "source": [
    "mlutils.plot_2Ddata_with_boundary(m.predict, X, y); plt.grid();\n",
    "np.mean(y==m.predict(X))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "check your model with different parametrizations of the `moons` dataset (more and less data points, more and less noise)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 293,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_moons\n",
    "\n",
    "mX, my = make_moons(100, noise=.1)\n",
    "m = SimpleModel()\n",
    "m.fit(mX,my)\n",
    "\n",
    "mlutils.plot_2Ddata_with_boundary(m.predict, mX, my); plt.grid();\n",
    "np.mean(my==m.predict(mX))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**submit your answer**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 313,
   "metadata": {},
   "outputs": [],
   "source": [
    "student.submit_task(globals(), task_id=\"task_03\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}